Conversation

leisuzz
Contributor

@leisuzz leisuzz commented Aug 26, 2025

What does this PR do?

  1. When using accelerate with DeepSpeed ZeRO-2, I ran into this error:
    "ValueError: At least one of the dataloaders passed to accelerate.prepare() has None as batch size. Please set an integer value in train_micro_batch_size_per_gpu in the deepspeed config file or assign integer value to AcceleratorState().deepspeed_plugin.deepspeed_config['train_micro_batch_size_per_gpu']."

I think the current batch_sampler does not support DeepSpeed, so I added handling to support it (see the sketch after this list).

  2. I've also added support for saving checkpoints when using DeepSpeed.
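
The workaround the error message points to can be sketched as follows. This is a minimal sketch, assuming the script's batch size is available as `train_batch_size` (an illustrative name, not one from this PR):

```python
from accelerate import Accelerator
from accelerate.state import AcceleratorState

# Illustrative stand-in for the batch size the training script actually uses.
train_batch_size = 1

accelerator = Accelerator()
state = AcceleratorState()

if state.deepspeed_plugin is not None:
    # With a custom batch_sampler, the dataloader's batch_size attribute is None,
    # so DeepSpeed cannot infer the micro batch size and accelerator.prepare()
    # raises the ValueError quoted above. Setting the key explicitly avoids that.
    state.deepspeed_plugin.deepspeed_config["train_micro_batch_size_per_gpu"] = train_batch_size
```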

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@leisuzz
Contributor Author

leisuzz commented Aug 26, 2025

Cc: @sayakpaul @a-r-r-o-w Please take a look at this PR. Thanks!

@leisuzz leisuzz force-pushed the kontext branch 3 times, most recently from 3950d2e to 3094c2b on August 26, 2025 09:01
@leisuzz
Contributor Author

leisuzz commented Aug 26, 2025

@sayakpaul I've also added DeepSpeed support to the load/save checkpoint functions. Thanks!
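
For reference, the usual way diffusers training scripts make checkpointing DeepSpeed-aware is through the accelerate state hooks; the sketch below shows that general pattern (it is not necessarily the exact diff in this PR, and `save_pretrained` stands in for whatever weights the script really saves):

```python
from accelerate import Accelerator
from accelerate.utils import DistributedType

accelerator = Accelerator()

def save_model_hook(models, weights, output_dir):
    # Under DeepSpeed the modules are wrapped by the engine and `weights` can be
    # empty, so the hook must also run on non-main processes.
    if accelerator.distributed_type == DistributedType.DEEPSPEED or accelerator.is_main_process:
        for model in models:
            # Illustrative: the real script saves its LoRA / transformer weights here.
            accelerator.unwrap_model(model).save_pretrained(output_dir)
            if weights:
                weights.pop()

def load_model_hook(models, input_dir):
    # Symmetrically, pop each wrapped model and restore its weights from input_dir;
    # the concrete loading logic depends on the training script.
    while len(models) > 0:
        model = models.pop()
        _ = accelerator.unwrap_model(model)  # load the checkpointed state dict into this module

accelerator.register_save_state_pre_hook(save_model_hook)
accelerator.register_load_state_pre_hook(load_model_hook)
```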

Member

@sayakpaul sayakpaul left a comment


Thanks a lot!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@leisuzz
Contributor Author

leisuzz commented Aug 28, 2025

@sayakpaul Please approve the workflow, thx :)

@leisuzz
Contributor Author

leisuzz commented Aug 28, 2025

Hi @sayakpaul, it looks a bit weird since I don't think the original code needs the empty line, but I've added it anyway. Thanks :)

@leisuzz
Contributor Author

leisuzz commented Aug 29, 2025

Cc: @sayakpaul, I checked with ruff and make quality, and both passed. Sorry for the trouble and thanks for your help :)

@leisuzz
Contributor Author

leisuzz commented Sep 1, 2025

Cc: @sayakpaul @a-r-r-o-w

@sayakpaul
Member

@leisuzz could I request you to not ping the maintainers multiple times on a PR? This has happened in the past, so, I thought I would bring it up.

@leisuzz
Contributor Author

leisuzz commented Sep 1, 2025

@sayakpaul Sorry about that

@leisuzz
Contributor Author

leisuzz commented Sep 5, 2025

> @leisuzz could I request you to not ping the maintainers multiple times on a PR? This has happened in the past, so, I thought I would bring it up.

Hi @sayakpaul,
Sorry to ping you again, but it's been a week and this PR is kind of important.
Thanks for your time and help!

@leisuzz leisuzz requested a review from sayakpaul September 8, 2025 00:55
@leisuzz
Contributor Author

leisuzz commented Sep 9, 2025

Hi @sayakpaul, I think the failing tests are unrelated. Thank you for your help :)

@sayakpaul sayakpaul merged commit c222570 into huggingface:main Sep 9, 2025
25 checks passed